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Introduction 

The rapid increase in available computational power 
over the last decade has enabled higher resolution flow 
simulations and more widespread use of unstructured 
grid methods for complex geometries. While much 
of this effort has been focused on steady-state cal- 
culations in the aerodynamics community, the need 
to accurately predict off-design conditions, which may 
involve substantial amounts of flow separation, points 
to the need to efficiently simulate unsteady flow fields. 
Accurate unsteady flow simulations can easily require 
several orders of magnitude more computational ef- 
fort than a corresponding steady-state simulation. For 
this reason, techniques for improving the efficiency 
of unsteady flow simulations are required in order 
to make such calculations feasible in the foreseeable 
future. The purpose of this work is to investigate pos- 
sible reductions in computer time due to the choice 
of an efficient time-integration scheme from a series of 
schemes differing in the order of time-accuracy, and 
by the use of more efficient techniques to solve the 
nonlinear equations which arise while using implicit 
time-integration schemes. This investigation is carried 
out in the context of a two-dimensional unstructured 
mesh laminar Navier-Stokes solver. 

Implicit in any comparison of efficiency is a pre- 
cise error tolerance requirement. For stringent accu- 
racy requirements, high-order temporal discretization 
schemes are well known to be superior to lower or- 
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der (e.g. second-order) schemes, due to their superior 
asymptotic properties. However, for engineering cal- 
culations, where larger error tolerances (O(10 -2 ) - 
0(1O~ 3 )) are generally acceptable, second-order ac- 
curate time discretizations are currently the method 
of choice, and higher-order methods are generally 
shunned due to their increased cost per time step. Re- 
cently, the use of higher-order accurate implicit Runge- 
Kutta schemes has been shown to produce efficiency 
gains even for relatively coarse error tolerances using 
a production structured-mesh Navier-Stokes solver. 1 * 

In this paper, we perform a similar investigation 
within an unstructured mesh setting. Additionally, 
we investigate the efficiency of various non-linear so- 
lution techniques for solving the non-linear problems 
which arise at each time step for the various time dis- 
cretizations considered. We consider three solution 
techniques, namely, a non-linear multigrid method 
which solves the non-linear problem directly through 
pseudo-time-stepping, and two variants of an inexact 
Newton scheme, where the linear system at each New- 
ton iteration is partially solved using a linear multi- 
grid scheme or a multigrid preconditioned GMRES 
approach. Because high-order time discretizations 
achieve high temporal accuracy with relatively large 
time steps, thus increasing the condition number of 
the non-linear problem, the use of efficient non-linear 
solvers takes on additional significance in such cases. 

Non-linear multigrid methods were originally de- 
veloped for steady-state fluid flow problems and 
subsequently adapted to unsteady flow problems. 2 4 
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Newton-based methods have often been avoided in 
this context due to the additional memory overheads 
incurred by such methods and the difficulties in provid- 
ing reliable initializations for non-linear convergence. 
However, Newton-based methods have been shown 
to offer the potential for higher computational effi- 
ciency by avoiding frequent non-linear residual eval- 
uations. 5 Furthermore, the disadvantages of Newton- 
based methods are less relevant in the context of an 
unsteady flow solver, where a close initial solution is al- 
ways available from the previous time step, and where 
memory considerations are often secondary to cpu- 
tiine considerations. 

In this paper, we illustrate the potential savings 
achieved using higher-order time discretizations and 
more efficient non-linear solvers for unsteady flow sim- 
ulations on unstructured grids. We investigate the in- 
teraction between the time-discretization scheme and 
the non-linear solution technique as a function of tem- 
poral accuracy and show that the beneficial effects are 
multiplicative, producing up to an order of magnitude 
savings in computational effort. 

Base Solver 

Spatial Discretization 

For the purpose of comparison, an existing 
two-dimensional unstructured multigrid steady-state 
Navier-Stokes solver developed in 6 was modified to sim- 
ulate transient flows by incorporating various physical 
time-stepping schemes. The flow equations are dis- 
cretized using a finite-volume approach. Flow vari- 
ables are stored at the vertices of the mesh, and control 
volumes are formed by the median-dual graph of the 
original mesh, as shown in Figure 1. A control- volume 
flux balance is computed by summing fluxes evalu- 
ated along the control volume faces, using the average 
values of the flow variables on either side of the face 
in the flux computation. The construction of convec- 
tive terms corresponds to a central difference scheme 
which requires additional dissipation terms for stabil- 
ity. These may either be constructed explicitly as a 
blend of a Laplacian and biharmonic operators, or may 
be obtained by writing the residual of a standard up- 
wind scheme as the sum of a convective and dissipation 
term: 

neighbors 1 

5Z 2{ F ( w i) + F ( w k)}n ik 

fc= 1 

- ^ I A ik | (w L - w R ) (1) 

where the convective fluxes are denoted by F (w), nik 
represents the normal vector of the control volume face 
separating the neighboring vertices i and k , and 
is the flux Jacobian evaluated in the direction nor- 
mal to this face, wl and wr represent extrapolated 


flow values at the left- and right-hand sides of the con- 
trol volume face respectively. A matrix-based artificial 
dissipation scheme is obtained by utilizing the same 
transformation matrix | | as the upwind scheme, 

but using this to multiply a difference of blended first 
and second differences rather than a difference of re- 
constructed states at control- volume boundaries. For 
the calculations performed in this work, which involve 
only subsonic flows, the matrix dissipation formed us- 
ing only second differences has been used exclusively, 
and the physical viscous terms for the Navier-Stokes 
equations are discretized to second-order accuracy us- 
ing a finite- volume approximation. 



Fig. 1 Median control- volumes for triangular 
meshes 

Temporal Discretization 

Time is discretized in a fully implicit sense using 
both multistep Backward Difference Formulas (BDF) 
and multistage Runge-Kutta(RK) schemes. There are 
two mathematical properties that are desirable of a 
numerical integrator. The first is the “A-stability” 
property which guarantees that all eigenvalues lying in 
the left half of the complex plane will have an ampli- 
fication of no more than 1, independent of the chosen 
step size. Hence, the only restriction on the time step 
with an A-stable scheme is the consideration of solu- 
tion accuracy. The second is the u L-stable” property 
which guarantees that eigenvalues approaching — oo 
are damped in one time step. 

Multi step BDF formulas, and in particular the 
second-order accurate BDF scheme (BDF2), are 
widely used in computation of large scale engineering 
flows. These schemes require only one nonlinear set of 
equations to be solved at each time step. They suf- 
fer, however, from not being self- starting, are difficult 
to use with variable time steps, and are not A-stable 
beyond second-order temporal accuracy. 

On the other hand, multistage RK schemes are self- 
starting, are easily implemented in a variable time- 
stepping mode, and can be designed with A- and In- 
stability properties for any temporal order. However, 
these schemes require multiple nonlinear solves at each 
time step, and hence have often been discounted as 
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non-competitive compared to BDF schemes. One of 
the objectives of this paper is to investigate the rela- 
tive efficiencies of BDF and RK schemes in computing 
time-dependent solutions to a given level of accuracy. 

Consider the integration of the system of ordinary 
differential equations (ODEs) represented by the equa- 
tion, 


We use the following notation, RKxy refers to an 
ESDIRK scheme which has x stages and y th order ac- 
curacy. Bijl et al. 1 have compared these schemes 
and found RK64 to perform well. The numerical val- 
ues for the coefficients of this scheme are given in the 
Appendix. More details in general about ESDIRK 
schemes can be found in. 8 


2l = S<t,w(f)) (2) 

where the vector S results from the spatial discretiza- 
tion of the equations of fluid mechanics. The general 
formula for a k - step BDF scheme can be written as: 

k~ 1 

= -£ a l w Ti+ * A A«AS B+ * (3) 

*=o 

BDF schemes require the storage of k+1 solution lev- 
els and the computation of one non-linear solution 
at each time step. For k=2, the second-order accu- 
rate (BDF2) scheme is obtained using the coefficients: 
a 0 = — 4/3,cki = 1/3 , (h = 2/3. More details of these 
standard schemes can be found in. 1,7 

Runge-Kutta methods are multistage schemes and 
are implemented as: 

S 

w k - w" + (At)^o*,S (w') ,k=l,s (4) 

3=1 

8 

w" +1 =w" + <A t)£&>S(w>) (5) 

j = l 

where s is the number of stages and a,j and bj are 
the Butcher coefficients of the scheme. Following the 
previous work by Bijl et al. 1 we focus on the ES- 
DIRK class of RK schemes, which stands for Explicit 
first stage, Single diagonal coefficient. Diagonally Im- 
plicit Runge-Kutta. The Butcher table for a six stage 
ESDIRK scheme is shown in Table T 


C\ = 0 

0 

0 

0 

0 

0 

0 

C2 

a 21 

<*66 

0 

0 

0 

0 

C3 

<*31 

<*32 

<*66 

0 

0 

0 

C 4 

<*4i 

<*42 

<*43 

<*66 

0 

0 

C5 

<*51 

<*52 

<*53 

<*54 

<*66 

0 

c© = 1 

b i 

b 2 

&3 

b 4 

h 

<*66 

W"+' 1- 

bi 

b 2 

b 3 

b 4 

bo 

<*66 


Table 1 Butcher Tableau for the ESDIRK class of 
RK schemes with number of stages, s = 6. 

In Table 1, c k denotes the point in the time inter- 
val, [t,t + At\. These schemes are characterized by 
a lower triangular form of the coefficient table, thus 
resulting in a single implicit solve at each individual 
stage. The first stage is explicit (a*i = 0) and the 
last stage coefficients take on the form a k j = bj, thus 
enabling equation (5) to be simplified as 

w" +1 = (6) 


Implicit Solution Technique 

Both BDF and RKxy schemes require the solution of 
a nonlinear system of equations. BDF schemes require 
the solution of one nonlinear equation per time step. 
In the case of BDF, a nonlinear residual, R(w), can 
be defined from equation (3) and is given by: 


R (w) = R (w n+ *) 

w n+ * (7) 

= 5L p k + SRCbdf 

At 

where the superscript on w has been dropped for the 
sake of simplicity. SRCbdf is the source term inde- 
pendent of w = w n +* and is given by, 


SRCbdf 


At 




,n+» 


Li=0 


( 8 ) 


In the case of RKxy schemes, a nonlinear equation 
arises at each stage of the time-stepping scheme and 
hence requires more than one nonlinear solve per time 
step. Again, a nonlinear residual, R(w) for each stage 
of the RKxy scheme can be defined using equation (4) 
as follows: 

R(w) = R (w*) 

k (9) 

= ^ afc*S (w fc ) + SRCrk 

A* v 

where the superscript on w has again been dropped 
for the sake of simplicity. Also, SRCrk, is the source 
term independent of w = w* and is given by, 


SRCrk = ~ ^ a fcjS (w*) 


it — i 


(10) 


3 = 1 


Hence, in both BDF and RKxy we are required to 
obtain the solution of the nonlinear system of equa- 
tions, 


R(w) = 0 (11) 

Three different methods are proposed for solving 
equation ( 11 ) and their relative performances are stud- 
ied. The three methods, in this paper, are henceforth 
referred to as : 

1. Nonlinear Multigrid (NMG) 

2. Linear Multigrid (LMG) 
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3. Preconditioned Generalized Minimal Residual 

(PGMRES) 

In NMG, a pseudo-time-stepping scheme is em- 
ployed to obtain the solution of the nonlinear system 
of equations, which is accelerated using a non-linear 
full approximation storage (FAS) agglomeration multi- 
grid method 5,6 In the other two approaches, an inexact 
Newton solution strategy 7 is used to solve the nonlin- 
ear system of equations. The arising linear system of 
equations is solved using iterative/Krylov techniques. 
To accelerate convergence the linear system is left pre- 
conditioned using an approximate inverse to the first- 
order accurate Jacobian which in itself is employed as 
an approximation to the Jacobian of the second-order 
accurate discretization. The last two approaches differ 
only in the methods used to solve the preconditioned 
linear system of equations. LMG uses the Richard- 
son’s iterative method while the PGMRES uses the 
Generalized Minimal Residual method developed by 
Yaad and Schultz. 9 

Inexact Newton’s Methods 

To solve the non-linear system of equations R (w) = 
0, Newton’s method requires the solution of a series of 
linear systems of the form, 

.^L/ wW=_r ( wW ) (i2) 

where, 

W E* +1 ] = + SwW (13) 

Let, 

x = r = R ; A = 

(14) 

Hence, equation (12) now' becomes, 

Ax = -r (15) 

Traditionally, there have been two main obstacles 
to the use of Newton’s method for large scale multi- 
physics applications : 

1. An initial guess inside of the radius of convergence 
is required for Newton's method to converge. How- 
ever, for unsteady problems, a good initial guess 
is provided by the solution at the previous time 
step. If the Newton’s method does not converge, 
then by lowering the time step one can get the 
initial guess as close as necessary to the solution 
at the next time level. In this paper, no difficul- 
ties were encountered in the convergence of the 
Newton iterations for all the time steps used. 

2. Construction and storage of the Jacobian ma- 
trix, A becomes prohibitive. This problem is 


particularly exacerbated in 3D and while using 
higher-order spatial discretizations which are not 
confined to the nearest neighbor stencils. This 
problem is overcome by the use of Jacobian- free 
methods to solve the linear system of equations. 
However, additional memory is still required to 
store the first-order Jacobian for the precondition- 
ing operation, and the various Krylov vectors for 
the GMRES scheme. On the other hand, the use 
of additional memory can be rationalized if this 
produces substantial gains in cpu time, particu- 
larly for unsteady flow' simulations where cpu time 
is the dominant concern. 

Additionally, in order to improve the computational 
efficiency of these methods, we use an inexact New- 
ton’s method 10 where the arising system of equations 
are not solved exactly. In this paper, we employ a 
very simplistic method where the number of iterations 
carried out by the underlying iterative linear solver is 
held fixed. In it’s exact form, Newton’s method pro- 
vides quadratic convergence. However due to all the 
approximations employed by the solution method in 
this paper, this rate of convergence is not achieved. 

Preconditioned Inexact Newton’s Method 

In order to achieve rapid convergence of the lin- 
ear problem at each Newton iteration, preconditioning 
methods are used to cluster the eigenvalues of the sys- 
tem. We adopt the approach of left preconditioning in 
order to achieve this desirable distribution of eigenval- 
ues. The preconditioned system can be written as: 

TV Ax = -TVr (1C) 

We make the following comments on the precondition- 
ing: 

• The preconditioner, TV, is looked upon here as 
an operator as opposed to a matrix. Hence, V V, 
may or may not be able to be w ritten as a matrix. 

• The preconditioner must be chosen as close as pos- 
sible to A' 1 so that TV A % X, where X is the 
identity operator. 

• Since, each step of the New r ton’s method, equa- 
tion (12), moves w^, towards the solution, w*, 
of R (w) = 0, any operator which produces a cor- 
rection <5wW = x to advance towards w* 
would serve as a reasonable preconditioner. 

• Based on the above fact, single or multiple cy- 
cles of the nonlinear multigrid method discussed 
in the earlier subsection could be used as a pre- 
conditioner. However, this is computationally in- 
efficient as we do not recognize the fact that we 
are now solving a system of linear equations. 

Keeping in mind the above observations we now' pro- 
pose a better preconditioner, TV* We first choose 


dR 

dw 
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TV = A" 1 , where A is a matrix approximation to the 
Jacobian, A. Considerations governing the choice of A 
would be, 

1. Storage requirements for A must riot be pro- 
hibitive . Preferably it must use less space than 
the space needed for A or else there would be no 
space gain in using Jacobian-free methods. 

2. The inverse of A must be simple to calculate or 
approximate . If onejs able to compute an ap- 
proximate inverse to A fairly easily using iterative 

methods, this new operator, TV — A would 
serve as an appropriate preconditioner. The two 
tildes are used to symbolize the fact that there 
are two approximations involved in the definition 
of the TV, 

(a) A which is an approximation to the Jacobian, 

A 

(b) An approximation in computing the inverse 
of A 

In this paper, we choose A to be the Jacobian of the 
first-order accurate, nearest neighbor discretization of 
the nonlinear set of equations. Hence, the storage of 
A requires substantially less memory than that of the 
full Jacobian A of the second-order accurate scheme. 

A" 1 is computed approximately using several iter- 
ations of a linear multigrid method. This particular 
multigrid method can be viewed as the linear ana- 
logue of the non-linear FAS agglomeration multigrid 
scheme described in the NMG method. This approach 
has been previously described in detail in reference. 0 
In this particular approach the coarse level approxima- 
tions to the Jacobian are obtained taking the Jacobian 
of the Galerkin projection of the (frozen) fine grid op- 
erator as: 

Ah = 9^7 Oh R,,| H) (17) 

as opposed to the more traditional linear multigrid 
Galerkin projection: 

= = . (18) 

where i* is the restriction operator and is the pro- 
longation operator. This approach was chosen purely 
for convenience, as the terms in equation (17) are read- 
ily available. The smoother on each grid was taken as 
a block diagonal Jacobi solver. 

Linear Multigrid (LMG) 

In the method referred to as LMG, the linear multi- 
grid preconditioned system arising at each Newton 
iteration is solved using a single Richardson iteration. 
In order to solve the system, 

TV Ax = — TV r (19) 


we define the splitting 

TV A = 1 + Af 

The resulting iterative scheme is defined as, 

Ix (m+1) = -TVr-Vx (m) 

Z«5x (m) = -TV* - ?VAx (m) 

As indicated earlier, since we are required to solve 
equation (19) only approximately, we carry out only 
a single Richardson’s iteration. Assuming x (l) = 0, 
equation (21) reduces to, 


(20) 

( 21 ) 


Jx (1 * = — T V r 

x< 2 > = x (1 U<5x* 1) =<5x< 1 > = -TV r 


( 22 ) 


Hence, we have 

<5w^ = x (2) = — TV r (23) 


Equation (23) illustrates the correspondence of this 
scheme to a Newton’s method in which the first-order 
accurate Jacobian is used along with a second-order 
accurate residual. 


Preconditioned Generalized Minimal Residual 
(PGMRES) 

Having presented the LMG scheme as a precondi- 
tioned Richardson iteration, the PGMRES scheme can 
similarly be described as the equivalent scheme ob- 
tained when the single Richardson iteration is replaced 
by a GMRES Krylov subspace iterative approach. In 
this method, we use GMRES to solve equation (16) 
in a matrix-free Newton-Krylov fashion, making use 
of the preconditioner, TV 5 which is based on a linear 
multigrid. The matrix-free implementation of PGM- 
RES requires the computation of the product, TV Ax, 
which is approximated using a first-order Taylor series 
expansion as, 


TVR (wM + ex) - TVR (wM) 
V Ax — ^ 


TVR (wM + ex) - TVr 
e 


(24) 


where x is some unit vector and e is a number chosen 
close to machine round-off. We use the restarted form 
of GMRES with a fixed number of search directions. 
While increasing the number of Krylov vectors accel- 
erates convergence, storage and cpu time increase with 
the number of search directions. The optimal number 
of search directions is therefore determined experimen- 
tally. 


Validation of the Temporal Scheme 

Numerical experiments have been performed to de- 
termine the observed order of accuracy of the vari- 
ous time-integration schemes. The test problem cho- 
sen for this purpose consists of the unsteady laminar 
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flow around a two-dimensional circular cylinder at a 
Reynolds number of 1200 and a Mach number of 0.2. 
The initial flow r is symmetric with zero lift. As the 
wake behind the cylinder starts to grow, it becomes 
unstable and begins to shed vortices from alternate 
sides of the cylinder. The computational grid is shown 
in Figure 2. The far-field boundary is a circle con- 
centric with the cylinder and diameter given by 40D 
where D is the diameter of the cylinder. A close up of 
the mesh around the cylinder is shown in Figure 3 



Fig. 2 Computational mesh for circular cylinder 



Fig. 3 Computational mesh in the region around 
the cylinder 

Time is non dimension alized as Ut/D where U is 
the free stream velocity and D is the diameter of the 
cylinder. The initial condition for the various studies 
was obtained by simulating the limit cycle behavior 
for around 10 shedding cycles using a relatively small 
time step. Using At = 0.025, the Strouhal number was 
calculated to be 0.2469. The variation of Cl with time 
is shown in Figure 4. A plot of the density contours 


at an intermediate time is shown in Figure 5. 



Fig. 4 Variation of Cl with non dimensional time 



Fig. 5 Density Contours calculated using a time 
step of 0.025 

In order to determine the order of accuracy, the test 
problem was solved using the same initial condition 
but with different time steps. The time interval of the 
study was approximately 1- shedding cycles. The so- 
lution at the end of time interval is assumed to have 
accumulated the temporal error. Integral measures 
such as lift on the body, drag due to pressure forces 
and pitching moment of the body were then compared 
as follows to determine the order of accuracy. Let Gai 
denote the integral measure being compared using a 
time step At , while G ex act denotes the exact solution. 
We do not know’ G exa ct but based on the order of accu- 
racy n of the scheme, we expect the follow ing behavior: 

Gai — Gexact + Ci (A£)” + Higher Order Terms 

(25) 

where C\ is a constant. By subtracting from equation 
(25) a similar expression for G^ and neglecting the 
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higher-order terms, we can obtain the following rela- 
tion, 

G At -G ¥ =C, (l + ^) ( At )" (2C ) 

= C 2 (At)" 

where C 2 is another constant. Equation (26) can be 
used to determine the order of accuracy of the scheme 
which can then be compared with the expected order 
of accuracy based on theory. 

The order of accuracy was verified for two ESDIRK 
schemes ( RK64 and RK43) and the second-order BDF 
(BDF2) scheme. The nonlinear systems which arose 
were converged until the maximum density correction, 

| Ap |max < 10~ 10 . This ensures that the “iteration 
error” is negligibly small relative to the discretization 
error. Figures 6, 7 and 8 show the detailed refinement 
study for the RK64, RK43 and BDF2 schemes respec- 
tively. It can be seen that all the integral measures 
yield nearly the same quantitative conclusions. The 
anomalous behavior in Figure 8 is likely due to the 
choice of a large time step. It is also seen that the 
computed order of accuracy is close to the expected 
order of accuracy. For example, Figure 6 shows that 
the computed order of accuracy for the RK64 scheme 
is 3.8938 while the expected order of accuracy is 4. 



Parameter selection in the Linear 
Multigrid (LMG) 

The inexact Newton methods contain various pa- 
rameters which must be chosen judiciously in order 
to optimize the overall run-time of these methods. For 
the LMG scheme, the parameters to be chosen include: 

1. Number of linear multigrid cycles carried out in 

Vh 

2. Number of smoothing iterations carried out on 
each grid of the multigrid. 




Fig. 8 Verifying order of accuracy of BDF 2 

Increasing either of these parameters would make TV 
a better approximation to A~ l . Figure 9 illustrates 
the effect of increasing the number of linear multigrid 
cycles used in TV* All the results shown in this section 
and the next are carried out using the BDF2 physical 
time stepping scheme and a At = 0.05. 

We observe the following : 

1. The rate of convergence slows down as the non- 
linear residual decreases. This is due to the in- 
exact solution of the linear system described by 
equation (16) at each Newton iteration, as we use 
only a single Richardson’s iteration to solve equa- 
tion (16). Hence, the quadratic convergence ex- 
pected of exact Newton’s methods is not achieved. 

2. Increasing the number of linear multigrid cycles in 
TV can only increase the non-linear convergence 
by a finite amount, and eventually asymptotes to 
a maximum rate. This is due to the fact the pre- 
conditioner TV based on a first-order Jacobian, 
w'hich even if inverted exactly does not correspond 
to the inverse of the exact Jacobian, A 
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Fig. 9 Effect of changing the number of linear 
multigrid cycles in TV of LMG 

Since, the ultimate goal is to minimize the runtime 
required to converge the equations, R(w) = 0, to a 
given tolerance level, we plot the reduction of the resid- 
ual against runtime for different choices of the number _ 
of linear multigrid cycles. It can be seen from Figure 1 



Fig. 10 Effect of changing the number of linear 
multigrid cycles in TV of LMG on the runtime 

10 that the use of 2 linear multigrid cycles in TV is 
computationally most efficient for the range of error 
tolerances to which the nonlinear equations are con- 
verged. 

Figures 11 and 12 show the effect of performing dif- 
ferent number of smoothing iterations on each grid of 
the multigrid. As expected, Figure 11 shows that the 
number of Newton iterations decreases with increase 
in the number of smoothing iterations. 

The study indicates that the use of 5 smoothing 
cycles on each grid, while using two linear multigrid 
cycles, results in a computationally optimal precondi- 
tioner, TV, for the problem under consideration. 

Parameter selection in PGMRES 

Having selected all required parameters for the lin- 
ear multigrid preconditioner, TV> the remaining pa- 
rameter to be determined in the PGMRES method 



Fig. 11 Effect of changing the number of smooth- 
ing iterations carried out on each grid of the multi- 
grid 



CPU Time 

Fig. 12 Effect of changing the number of smooth- 
ing iterations carried out on each grid of the multi- 
grid on the runtime 

is the number of search directions. Increasing the 
number of search directions provides a more accurate 
solution to the linear system, which arises at every step 
of the Newton’s method. Thus, the Inexact Newton’s 
method approaches the Exact Newton’s method as the 
number of search directions is increased. 

Figure 13 shows the effect of increasing the number 
of search directions on the number of Newton itera- 
tions required to converge to the solution of R (w) = 
0. It can seen that the effect of increasing the search 
directions is more pronounced as the nonlinear resid- 
ual becomes smaller. 

However, increasing the number of search directions 
by one incurs the following additional computational 
overhead : 

• A single evaluation of the nonlinear residual on 
the fine grid. 

• A single evaluation of the preconditioner, TV* 

• Additional matrix-vector and vector-vector prod- 
ucts required to compute the extra search direc- 
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Newton Iterations 


Fig. 13 Effect of changing the number of search 
directions in PGMRES 

tion and the optimal solution, in a larger 

Krylov subspace. 

• Additional storage for the extra search direction. 

Hence, the choice of the number of search directions 
was decided based on computational efficiency. Figure 
14 plots the variation of the nonlinear residual against 
runtime for different choices of the number of search di- 
rections. It was determined that the use of five search 
directions was computationally most efficient. 



Fig. 14 Effect of changing the number of search 
directions in PGMRES on the runtime 


Runtime comparison of different 
schemes 

In this section we examine the computational effi- 
ciency of two time-discretization schemes, RK64 and 
BDF2 as a function of accuracy levels. We simultane- 
ously investigate the relative performances of the dif- 
ferent implicit solution techniques discussed in this pa- 
per. Finally, we show that the combined improvements 
in efficiency obtained by using higher-order schemes 
and better nonlinear solvers such as LMG can result 
in up to an order of magnitude speedup in overall so- 
lution efficiency. 


In order to compare the different schemes, we com- 
pare the runtimes required to advance the solution 
from an initial time Tj to a specified final physical 
time Tf, given an error tolerance in the final solution. 
In this study, we assume the error in the lift coeffi- 
cient, Cl, to be a good measure of the integral error 
in the final solution. To measure the error in Cl, the 
numerical solution obtained using the RK64 scheme 
and At = 0.0125 was assumed to be numerically ac- 
curate with zero error. Finally, we choose 3 error 
tolerances, 10 2 , 10 -3 and 10 4 , which we deem to 
be representative of engineering error tolerances, and 
make a detailed comparison of the different schemes. 
We choose a time interval Tf — Tj = 1, noting that the 
ratios of the runtimes of the various schemes should 
remain invariant with arbitrary choices of the time in- 
terval. 

The physical time step, At, chosen to advance the 
equations to Tf depends on: 

1. The physical time-stepping scheme, in this case 
either BDF2 or RK64. 

2. The error tolerance level 

Figure 15 plots the variation of the error in Cl for 
the two different physical time-stepping schemes. In 
order to obtain Figure 15 the nonlinear equations at 
each time step were converged such that the rms error 
in the density residual was less than 10“ 10 , in order to 
avoid any contamination of the temporal error. The 
circle symbols in the Figure indicate the time steps 
used in the following study for both schemes to achieve 
the three different prescribed error tolerances. 



Fig. 15 Comparison of time steps required for 
BDF2 and RK64 schemes 

It can be seen that for the range of error tolerances 
considered BDF2 requires a much smaller time step 
than RK64. Furthermore, as RK64 is fourth-order 
accurate in time, the error decreases faster with a de- 
crease in time step, making it more efficient at lower 
error tolerance levels. 
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Scheme 

At 

NMG 

LMG 

PGMRES 


Cl Error = 

nr* 


BDF 

RK 

0.01842 

0.24193 

15.55 

121.56 

5.24 

39.52 

9.00 


Cl Error = 

10 3 


BDF 

0.005825 

12.45 

4.59 

9.01 

RK 

0.122047 

131.4 

43.84 



Cl Error = 

IQ- 4 


BDF 

0.001846 

12.49 

5.54 

8.76 

RK 

0.067285 

147.14 

44.91 



Table 2 Runtimes required for carrying out 1 
physical time step using the different nonlinear 
solvers, where time steps are chosen based on the 
given error tolerances on Cl 

Hence, the choice of the physical time-stepping 
scheme affects the overall efficiency of the simulation 
in three manners: 

1. The number of time steps required to integrate to 
T/, for a given temporal accuracy 

2. The total number of non-linear solves per time 
step 

3. The condition number of the non-linear systems 
to be solved. The non-linear systems produced 
by the RK scheme are generally more difficult to 
converge due to the larger time step involved with 
the higher-order scheme. 

We now try to quantify the efficiency gains by using 
higher-order schemes and a more efficient implicit solu- 
tion technique. A major contributor to the inefficiency 
of implicit methods is solving the nonlinear systems 
at each stage/step to inappropriate sub-iteration tol- 
erances. If the nonlinear system is solved to lower 
tolerances, the additional work does not increase the 
overall solution accuracy. We do not attempt to an- 
swer this question in any detail, but assume that it is 
sufficient to converge the nonlinear residual to 6 or- 
ders of magnitude less than the error in Cl- This 
level was determined through numerical experiments, 
by setting various convergence levels and measuring 
the final temporal error. While the ratio of residual 
convergence to error in Cl appears relatively large, 
this is an artifact of the fact that these quantities axe 
inherently scaled in different manners, since Cl repre- 
sents a global integrated quantity, as opposed to the 
average flow field residuals. 

In Table 2 we present the runtimes for the differ- 
ent combination of schemes. In Figures 16 and 17 
we show the nonlinear convergence characteristics of 
BDF2 and the second stage of RK64 for the three dif- 
ferent Cl error tolerances considered. Note that the 
required levels of convergence are more stringent for 
the smaller time steps, as shown by the bar symbol on 
each line. In the actual computations, the residuals 


were only converged to these levels. In these fig- 
ures, however, convergence down to machine accuracy 
is shown to illustrate the overall convergence behav- 
ior. These examples all utilize the nonlinear multigrid 
(NMG) scheme. 



CPU Tim© 

Fig. 16 Comparison of the convergence of the non- 
linear residual of BDF2 schemes for a single time 
step using NMG, where At’s are computed based 
on the given error tolerances on Cl 

-4 1 1 — — T . 

> C, error = IQ ,D©lta 1=0.241 93 

C, error = 1 0* Delta t=0. 1 22047 
-6 -\\ C, error = 10 .Delta ts0.067285 



-18 1 ' * * 1 

0 50 100 150 200 

CPU Time 


Fig. 17 Comparison of the convergence of the non- 
linear residual of RK64 schemes for the first stage 
of a single time step using NMG, where Af’s are 
computed based on the given error tolerances on 

C L 

We observe the following : 

1. Faster convergence for smaller error tolerances, as 
At is reduced. 

2. Slower overall convergence for RK64 relative to 
BDF2 for similar error tolerances, as larger time 
steps are used in RK64. 

3. Similar behavior is observed for the other two 
methods (LMG, PGMRES) as well. 

In Tables 3 and 4 we quantify the efficiency gains 
obtained by shifting from BDF2 to RK64 for the two 
different nonlinear solvers, NMG and LMG respec- 
tively. The numbers in the different columns are ratios 
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Ci 

Error 

A< JtA 

Runtimes da f or 
Runtime*^ 

1 time step 

Overall 

Gain 

A Ibdf 

10-2 

13.13 

0.128 

1.68 

103 

20.95 

0.095 

1.99 

Tf 

1 

o 

I— 1 

36.45 

0.085 

3.1 


Table 3 BDF2 to RK64 Speedup factors for NMG 


C L 

Error 

AlfiA 

Runtime fi A f or 
Runtime b df 
1 time step 

Overall 

Gain 

A Ibdf 

10"2 

13.13 

0.133 

1.75 

♦— * 
o 

i 

CO 

20.95 

0.105 

2.2 

10-4 

36.45 

0.123 

4.5 


Table 4 BDF2 to RK64 Speedup factors for LMG 

of the corresponding quantities used in RK and BDF 
respectively. The CPU time for a given time step is 
an order of magnitude larger for the RK schemes, and 
this ratio varies slowly with the increase in accuracy. 
This result is a function of the number of non-linear 
solutions required per time step, the relative stiffness 
of each non-linear problem, and the degree to which 
each system must be converged to maintain overall ac- 
curacy. However, the ratios of time steps between the 
two schemes is large and increases rapidly for higher 
accuracy levels, thus making the RK scheme more ef- 
ficient overall, particularly at the more stringent error 
tolerances. 

In Figure 18 we plot the variation of the nonlinear 
residual for a single time step of BDF2 using different 
nonlinear solvers as a function of the number of New- 
ton iterations for LMG and PGMRES, and as a func- 
tion of nonlinear multigrid cycles in the case of NMG. 
The inexact Newton methods exhibit faster conver- 
gence rates per iteration as compared to the nonlinear 
multigrid NMG method. Furthermore, PGMRES is 
seen to outperform LMG as well, due to the fact that 
PGMRES produces a more accurate solution of the 
preconditioned linear system, equation (19) at each 
Newton iteration. 



Fig. 18 Comparison of the convergence of the 
nonlinear residual of BDF2 schemes using the 3 dif- 
ferent nonlinear solvers 


In Figure 19 we plot the variation of the nonlinear 
residual for a single time step of BDF2 using different 
nonlinear solvers as a function of CPU time. We make 
the following observations: 

1. LMG is computationally more efficient than other 
schemes as it performs fewer nonlinear residual 
evaluations. 

2. Although PGMRES has faster convergence in 
terms of Newton Iterations, the additional cost 
of each Newton iteration outweighs the gain in 
non-linear convergence. 



Fig. 19 Comparison of the convergence of the 
nonlinear residual of BDF2 schemes using the 3 dif- 
ferent nonlinear solvers against runtime 

Finally, in Figure 20 we show the runtimes required 
to reach Tf — Ti = 1 using the different schemes, 
versus the error tolerance level in Ci. Figure 20 
clearly indicates that the commonly used combination 
of BDF2 and nonlinear multigrid (NMG) exhibits the 
lowest computational efficiency for this problem. The 
most efficient solution strategy is obtained using the 
RK64 temporal discretization with the linear multi- 
grid(LMG) solver, and these gains increase for higher 
accuracy levels. 

In Table 5, we emphasize the overall gains obtained 
in shifting from the BDF2-NMG to the RK64-LMG 
solution strategy, as a function of the temporal accu- 
racy. The gains obtained between the RK scheme and 
the BDF scheme and the gains obtained between the 
various non-linear solution strategies are multiplica- 
tive, producing a larger overall gain than for either 
method used alone. This compounded efficiency gain 
increases with solution accuracy, yielding up to an 
order of magnitude improvement for the highest ac- 
curacy considered. 


Conclusions 

Higher-order implicit multi-stage Runge-Kutta 
schemes have been shown to produce higher accuracy 
at reduced cost as compared to BDF2. Additionally, 
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Fig. 20 Runtimes required to reach 7/ = 1 us- 
ing the different schemes plotted against the error 
tolerance level in Cl 


Cl 

Error 

At/** 

ftuntimegpft r- 

Runtime/* k 
1 time step 

Overall 

Gain 


10“2 

13.13 

0.393 

5.16 

10-3 

20.95 

0.284 

5.95 

10“4 

36.45 

0.278 

10.14 


Table 5 BDF2-NMG to RK64-LMG Speedup fac- 
tors 

inexact Newton solution strategies have been shown 
to be well suited for solving the non-linear systems 
which arise from temporal discretizations at each time 
step. The efficiency gains of both approaches are mul- 
tiplicative, resulting in large potential savings when 
both methods are used in tandem. The combina- 
tion of RK64 with Linear Multigrid method (LMG) 
worked very well for the error tolerances considered. 
The preconditioned GMRES (PGMRES) algorithm 
studied in this work provided the fastest asymptotic 
convergence among all methods but was found to be 
non-competitive due to the slower initial convergence 
of the method when only partial convergence of the lin- 
ear systems is required. In cases where more accurate 
linear system solutions are required, PGMRES may 
be more competitive. Overall efficiency of the time- 
integration schemes is greatly affected by the degree 
to which the non-linear systems at each time step are 
converged. The levels of convergence adopted in this 
work were determined a posteriori, and are therefore 
not predictive. A more exact quantification of the re- 
quired convergence levels will be required in order to 
construct efficient time-dependent solution strategies. 


Future work will also include the use of temporal 
error estimation techniques coupled with dynamically 
adaptive time-step selection. 
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Appendix : Butcher Coefficients for RK64 

The Butcher coefficients for the RK64 scheme used in 
the paper are given in the table below. RK64 is a 
6 stage with scheme, with 4 th order accuracy and 5 
implicit stages. As in all ESDIRK schemes, bj = 


0 

0 

0 

0 

0 

0 

0.25 

0.25 

0 

0 

0 

0 

0.137776 

-0.055776 

0.25 

0 

0 

0 

0.144636866 

-0.2239319076 

0.4492950416 

0.25 

0 

0 

0.098258783284 

-0.59154424282 

0.8102105383 

0.28316440571 

0.25 

0 

0.15791629516 

0 

0.18675894052 

0.68056529531 

-0.275240531 

0.25 
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